Traveling Wave Ion Mobility-Derived Collision Cross Section Database for Plant Specialized Metabolites: An Application to Ventilago harmandiana Pierre

The combination of ion mobility mass spectrometry (IM-MS) and chromatography is a valuable tool for identifying compounds in natural products. In this study, using an ultra-performance liquid chromatography system coupled to a high-resolution quadrupole/traveling wave ion mobility spectrometry/time-of-flight MS (UPLC-TWIMS-QTOF), we have established and validated a comprehensive TWCCSN2 and MS database for 112 plant specialized metabolites. The database included 15 compounds that were isolated and purified in-house and are not commercially available. We obtained accurate m/z, retention times, fragment ions, and TWIMS-derived CCS (TWCCSN2) values for 207 adducts (ESI+ and ESI–). The database included 158 novel TWCCSN2 values from 79 specialized metabolites. In the presence of plant matrix, the CCS measurement was reproducible and robust. Finally, we demonstrated the application of the database to extend the metabolite coverage of Ventilago harmandiana Pierre. In addition to pyranonaphthoquinones, a group of known specialized metabolites in V. harmandiana, we identified flavonoids, xanthone, naphthofuran, and protocatechuic acid for the first time through targeted analysis. Interestingly, further investigation of unknown features using IM-MS suggested the presence of organonitrogen compounds and lipid and lipid-like molecules, which are also reported for the first time. Data are available on MassIVE (https://massive.ucsd.edu, data set identifier MSV000090213).


■ INTRODUCTION
Plant-derived natural products offer a wealth of potential therapeutic candidates. Numerous approved drugs were derived from bioactive metabolites present in plants. 1 These bioactive metabolites, also referred to as specialized metabolites or secondary metabolites, are produced by a specialized metabolism which is not essential for survival but is a metabolic consequence of environmental adaptation. 2 Traditional methodology for drug discovery in plants involves isolation and purification of individual compounds from plant extracts that have exhibited biological activities. This methodology encompasses laborious sample preparation and instrumental analyses [e.g., liquid chromatography (LC), mass spectrometry (MS), nuclear magnetic resonance (NMR) spectroscopy]. Over the decades, metabolomics has increasingly been recognized as a valuable technology for research in plants and traditional medicines. 3,4 Metabolome analysis can provide crucial information about the response of an organism to internal and external perturbations. It has aided in bioactive compound discovery, allowing large-scale metabolite profiling which provides molecular descriptors for the unknowns in complex natural product extracts.
Among these analytical platforms, MS is a mainstream technology for targeted and untargeted analyses because of its high sensitivity and selectivity as well as its ability to provide molecular information. Despite its merits, metabolite identification is challenging in MS-based metabolomics. Various technologies and data analysis strategies have been introduced to overcome such challenges, for example, increasing the chromatographic separation dimension, 5,6 utilizing high-resolution MS or tandem MS, 7,8 and creating open-access mass spectral databases. 9 The development of public databases has significantly propelled the application of metabolomics in plant natural products. 10 Conventionally, the identification of unknown features is achieved by comparing their accurate m/z values and/or fragment ions (or MS/MS spectra) against mass spectral databases. Despite the advancements in metabolomics technologies, the identification of unknown metabolites in plant extracts remains a major challenge because they contain a variety of constituents with highly different physicochemical properties and a vast number of isomers.
In recent years, a combination of ion mobility (IM) and MS (IM-MS) has been increasingly adopted across different research fields to improve metabolite identification. 11,12 IM provides an additional molecular property known as a rotationally averaged collision cross-section (CCS) value that is specific to a molecule and matrix-independent. 13,14 When combined with chromatography and MS, IM adds another separation dimension, differentiating ions by arrival time, which can improve the quality of mass spectra. 15 Incorporating CCS values into the traditional MS-based compound identification workflow helps increase metabolite identification confidence. Many studies have created in-house CCS databases to increase the metabolite identification accuracy as well as coverage in biological samples. 13,16,17 Currently, publicly available CCS databases cover a wide range of compound classes, such as plant metabolites, 18,19 pesticides, 14,20 polymers, 19 and pharmaceuticals. 20 To facilitate the integration of CCS measurement into the conventional MS-based metabolomics workflows, metabolomics communities have made efforts to create open-access experimental CCS databases, such as the Unified CCS Compendium, 21 AllCCS, 22 CCSbase, 23 and the Pacific Northwest National Laboratory database, 24 and to develop CCS prediction models. 22,23,25,26 Similar to the role of public MS databases, the growing CCS databases will lend support to large-scale metabolite identification. However, a PubMed database search using the keywords "ion mobility mass spectrometry" and "plant" revealed that in the past 10 years (2012−2022) only 6% of the articles concerning IM-MS were related to plants. In response to the demand for experimental CCS values of plant metabolites, we intend to provide another source of experimental TW CCS N2 values, determined by traveling wave ion mobility spectrometry (TWIMS), for specialized metabolites gathered from tropical plants and natural products over the years.
In this study, we aim to (1) develop a method for plant extract analysis using ultra-performance liquid chromatography coupled to a high-resolution quadrupole/TWIMS/time-of-flight MS (UPLC-TWIMS-QTOF), (2) establish a comprehensive MS and TWIMS-derived CCS (TW CCS N2) database for 112 specialized metabolites, and (3) apply the developed method and database to identify metabolites in V. harmandiana and extend the search for other specialized metabolites within this plant. This plant is of particular interest because of its wide spectrum of traditional uses. −29 Different species of the genus Ventilago of the Rhamnaceae family are tropical climbers distributed throughout South Asia and South-East Asia. 30,31 V. harmandiana is a rare endemic species found only in Thailand. To date, only ten pyranonaphthoquinones (PNQs) and nine anthraquinones (ATQs) have been identified in V. harmandiana, some of which exhibited anti-inflammatory activities. 28 Apart from PNQs and ATQs, no other specialized metabolites have been identified. Therefore, we aimed to apply the established TW CCS N2 database to explore other metabolites.

Plant Samples
The details of plant sampling are provided elsewhere. 32 In brief, leaf, root, bark, wood, and heartwood samples of V. harmandiana were collected from Trang Province, Thailand (lat. 7°47′12.8′′N, long. 99°30′12.8′′E; altitude 104 m a.s.l.). The samples were washed with tap water and air-dried, except for the leaves, which were dried in an oven at 80 °C until a constant weight was achieved, before being ground into powder.
Plant samples were extracted using a previously published method. 32 Briefly, 30 mg of powdered sample was extracted with 1 mL of MeOH using ultrasonication at 60 °C for 30 min. The crude extract was transferred to a conical tube and the residue was re-extracted with 1 mL of MeOH. The two extracts were combined and dried using a vacuum concentrator (Labconco, MO, U.S.A.). The dried residue was then reconstituted in 1 mL of MeOH. After filtration through a hydrophilic poly(vinylidene fluoride) filter, the extract was diluted 10-fold prior to LC-IM-MS analysis. Pooled samples (quality control) were prepared by combining 5 μL of the leaf, wood, bark, root, and heartwood extracts and were distributed throughout the sample batches for LC-IM-MS analysis.
MS conditions were optimized to assist the analysis of the selected specialized metabolites. Capillary voltages were 2.0 kV (negative mode) and 2.5 kV (positive mode). Cone voltages of 20 V (negative mode, ESI−) and 30 V (positive mode, ESI+) were applied with a cone gas flow rate of 50 L/h. The desolvation gas (N2) flow rate was 600 L/h, and the temperature was kept at 200 °C. The source temperature was set at 100 °C. MS was operated in ESI− and ESI+ modes to collect mass-to-charge ratios (m/z) from 30 to 1000 using a scan time of 0.2 s. IM and MS data were acquired in HDMSE mode using MassLynx version 4.1 (Waters Corporation). In the HDMSE mode, MS collects accurate masses of precursor ions and their respective drift times at low collision energy (0 eV), and accurate masses of product ions at high collision energies (energy ramp from 20 to 40 eV). Argon was used as the collision gas. The IM was operated at wave velocities of 800 m/s (ESI+) and 1000 m/s (ESI−), an IM wave height of 30 V, and a trap bias of 35 V. Nitrogen was used as the buffer gas in the IM chamber at a flow rate of 60 mL/min. Leucine enkephalin (part no. 186006013, Waters, U.S.A.) was used to calibrate mass detection, using reference ions of m/z 556.2771 (ESI+) and 554.2620 (ESI−). The TWIMS-derived drift times were calibrated against the Major Mix IMS/TOF Calibration Kit (part no. 186008113, Waters, Wilmslow, UK). All the authentic standards and plant samples were analyzed in triplicate in both ESI− and ESI+ modes.

Intraday and Interday Variations and Spiking Experiments
To evaluate intraday and interday variations in CCS measurements, 10 reference standard solutions were spiked into MeOH, yielding a final concentration of 20 μM. Using the method described above, the standard solutions were analyzed in ESI+ and ESI− modes in triplicate within a day for three consecutive days. Furthermore, CCS variations were evaluated in the presence of the matrix by spiking 50 reference standard solutions into the pooled plant matrix at 20 μM. The spiked samples were analyzed in triplicate in ESI+ and ESI− modes.

Data Processing and Analysis
Chemical classification (kingdom, superclass, class, subclass) of the studied compounds was performed using ClassyFire (Table S1). 37 Progenesis QI Informatics (Nonlinear Dynamics, U.K.) was used to process the acquired HDMSE data. To generate the in-house MS and TW CCS N2 database, peak alignment and peak picking were performed before collecting the accurate m/z, retention time, TW CCS N2 value, and fragment ions for each reference standard (Tables S1 and S3).
To identify metabolites in plant extracts, raw files were processed in batch, with a QC sample assigned as the reference for peak alignment. Only features with an intensity greater than 100 were subjected to metabolite identification. Identified features were classified into four levels based on the Metabolomics Standards Initiative; 38 level 1 metabolites should be validated with at least two orthogonal properties measured on authentic standards. In this study, level 1 metabolites were identified based on the in-house database using the following criteria: mass error <20 ppm, retention time tolerance <0.1 min, experimental fragment ions matching the in-house fragment ions or those given by the in-silico fragmentation embedded in Progenesis QI, and percent CCS difference (ΔCCS%) <4%. Level 2 metabolites were identified by matching the experimental m/z of adducts with The Human Metabolome Database (HMDB) 39 and Chemical Entities of Biological Interest (ChEBI) databases, 40 and by matching experimental fragment ions with in-silico fragment ions. Level 4 features carry their m/z and TW CCS N2 values but remained unidentified based on the selected databases. Level 3 features are putatively characterized compound classes, which were beyond the scope of the current study.
Statistical analysis was performed using Microsoft Excel 2016 (Microsoft). Percent difference in the CCS value (ΔCCS%) compared with the in-house database was calculated using the following equation.
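The numeric level 1 criteria listed above can be expressed as a simple filter. The following is a minimal sketch, not the Progenesis QI implementation: the `Record` class and function names are hypothetical, the ΔCCS% helper assumes the database value as the denominator, and fragment-ion matching is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One adduct: accurate m/z, retention time (min), and CCS (Å^2)."""
    mz: float
    rt_min: float
    ccs: float


def ppm_error(measured_mz: float, reference_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


def delta_ccs_percent(measured_ccs: float, reference_ccs: float) -> float:
    """Absolute percent CCS difference; database value as denominator (assumption)."""
    return abs(measured_ccs - reference_ccs) / reference_ccs * 100.0


def passes_level1_filters(feature: Record, reference: Record,
                          ppm_tol: float = 20.0,
                          rt_tol_min: float = 0.1,
                          ccs_tol_pct: float = 4.0) -> bool:
    """Apply the mass, retention time, and CCS tolerances stated in the text.

    Fragment-ion matching is also required for level 1 but is not modeled here.
    """
    return (abs(ppm_error(feature.mz, reference.mz)) < ppm_tol
            and abs(feature.rt_min - reference.rt_min) < rt_tol_min
            and delta_ccs_percent(feature.ccs, reference.ccs) < ccs_tol_pct)


# Illustrative values only (not taken from the database).
reference = Record(mz=300.0, rt_min=5.00, ccs=170.0)
feature = Record(mz=300.003, rt_min=5.05, ccs=172.0)
print(passes_level1_filters(feature, reference))
```

Keeping the three tolerances as keyword arguments makes it easy to test how the number of level 1 hits changes when, for example, the CCS tolerance is tightened from 4% to 2%.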
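The equation is not reproduced in this text. A commonly used form, assumed here rather than confirmed by the source, expresses the deviation of a measured value relative to the database value; the subscripts are labels introduced for illustration:

```latex
\Delta\mathrm{CCS}\% =
  \frac{\left|\mathrm{CCS}_{\mathrm{measured}} - \mathrm{CCS}_{\mathrm{database}}\right|}
       {\mathrm{CCS}_{\mathrm{database}}} \times 100
```

Under this assumed form, a measured value of 172.0 Å2 against a database value of 170.0 Å2 gives ΔCCS% ≈ 1.2%.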

CCS Prediction
Predicted CCS values were obtained by entering SMILES structures into the AllCCS 22 and CCSbase 23 web interfaces (Tables S1 and S3). For compounds whose [M − H − H2O]− and [M + H − H2O]+ adduct CCS values were not available in the online databases, CCS values were calculated using IMoS. All geometrical optimizations were performed using Gaussian 09. 43,44 The Merz−Kollman partial charges and the dipole moment were calculated using the pop=(mk,dipole) keyword. 45 Each optimized structure was confirmed to be a local minimum by the absence of imaginary vibrational frequencies. The Gaussian output files (.log) were converted into IMoS input files (.mfj) using a Python script. 46 IMoS was used to calculate the average CCS value for each optimized adduct. The CCS calculation was performed using the N2-based trajectory method with Lennard−Jones (TMLJ) parameters and ion−quadrupole potential (QPoL) parameters. According to the IMoS user manual, the following recommended TMLJ parameters were used: number of orientations = 3, total/orientation = 300 000, time step coefficient = 150, diffuse = 1, temperature = 304 K, and pressure = 101 325 Pa. N2 gas and default values for other molecular parameters were selected. 41

Journal of Proteome Research

■ RESULTS AND DISCUSSION

Natural Product Database Characteristics
Our in-house MS and TW CCS N2 database of specialized metabolites comprised 112 compounds covering masses from 140 to 640 Da, representing 10 classes: 37 flavonoids, 11 isochromanequinones, 7 linear 1,3-diarylpropanoids, 6 cinnamic acids and derivatives, 18 anthracenes, 12 benzene and substituted derivatives, 10 benzopyrans, 8 prenol lipids, 2 naphthofurans, and 1 organooxygen compound (Figure 1). Almost half of the reference standards (48%) were isolated and purified in-house, and 15 of these were not commercially available (Table S2). S1 and S3). The experimental m/z values were concentrated in the region between 300 and 350 Da, accounting for 66% of the measured adducts; the TW CCS N2 values ranged from 120 to 230 Å2. Except for gallic acid, all compounds were detected in the ESI+ mode; however, 13 flavonoids, 2 isochromanequinones, and 1 benzene and substituted derivative were not detected in the ESI− mode. The database included 158 novel TW CCS N2 values from 79 specialized metabolites, some of which were PNQs (classified as isochromanequinones) discovered in V. harmandiana, triterpenoids (classified as prenol lipids) found in Ganoderma lucidum, and naphthofurans found in V. maingayi.
To determine the reproducibility of m/z and CCS measurements, we analyzed all reference standards in triplicate in the ESI− and ESI+ modes. All of the metabolites showed percent relative standard deviations (%RSD) of measured m/z values of less than 0.001% (Figure S1A). For CCS measurement, 92% of the adducts demonstrated a %RSD of less than 2% (Figure S1B). These results indicate that the developed UPLC-TWIMS-QTOF method is robust and reliable. The %RSD values for CCS measurement (<2%) were nevertheless larger than those reported by other studies, which are typically less than 1%. 14,47 The larger deviations could be a result of protomer formation, resulting in multiple conformations of ions in the gas phase. With low-resolution IM, the presence of unresolved protomers could have caused the CCS values to fluctuate. −50 Recently, caffeine and its isomeric metabolites were investigated using ultrahigh-resolution cyclic ion mobility. 51 As a result, for certain compounds, multiple peaks could be separated using three cycles of IM separation. Our finding calls for cross-laboratory CCS measurements on these plant metabolites, most of which have never been measured for CCS values, to improve compound identification using the established database. Taking the experimental CCS data into account, compound identification was performed using a CCS deviation threshold of <4% in combination with other compound identification criteria including retention time, mass error, and fragment ions.
We further investigated the stability and reproducibility of CCS measurements through intraday and interday experiments on 10 representative metabolites in ESI− and ESI+ modes. We chose 5 isomeric pairs, some of which are present in V. harmandiana, to investigate chromatographic, mass, and CCS separations. The intraday and interday variations were relatively small, with %RSD ranging from <0.01% to 1.8% and from 0.2% to 1.4%, respectively (Table S4). The results suggest that the TW CCS N2 values were robust and reproducible after the samples were stored for 3 days.
Next, we spiked 50 reference standards from different molecular classes into diluted V. harmandiana extract to determine the reproducibility of TW CCS N2 values in the presence of the plant matrix. For most of the adducts (92%), deviations from the TW CCS N2 database (ΔCCS% < 2%) were within the observed uncertainty. The deprotonated adduct of questin yielded the highest ΔCCS% value of 3.5% (Table S5), which could have been due to protomer formation. 51 Overall, the results demonstrated that the CCS measurement was reproducible and matrix-independent, which is in line with previous studies. 13,14,20 Among the 50 reference standards, there were three coeluting isomeric pairs: (1) 4,6,3′,4′-tetrahydroxy-2-methoxybenzophenone and 2,3′,4,5′-tetrahydroxy-6-methoxybenzophenone, (2) garciosone A and 4,3′,4′-trihydroxy-2,6-dime-  S6). Overlaid mobiligrams of the three pairs analyzed in ESI− also showed unresolved peaks (Figure S2). The efficiency of resolving two peaks by IM can be measured by the percent difference in CCS of the two peaks (ΔCCS%), defined as the difference in CCS of the two peaks divided by their average value. 52 In this context, the ΔCCS% values ranged from 0.61 to 1.28 (ESI+) and from 0.48 to 0.91 (ESI−). Given the small ΔCCS% and chromatographic coelution, an IM with a higher resolving power (Rp) of at least 130−300 is required to resolve the isomer peaks at half height. 52 Because the mobility separation of these coeluting isomers is limited when using the TWIMS instrument, which has a CCS-based Rp of ∼40−50, 52 they are reported as a sum if present in plant samples. We also calculated the ΔCCS% for the four sets of isomeric pairs spiked in the diluted plant matrix that were accurately identified (Table S6). The ΔCCS% values ranged from 0 to 0.80 (ESI+) and from 0.30 to 2.48 (ESI−), which are also too small for baseline separation on TWIMS; however, these isomers were well separated by retention time (Table S6). These results suggest that the combination of chromatography, IM, and MS is indispensable for the analysis of complex samples.
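The %RSD figures quoted above follow from replicate measurements in the usual way. A minimal sketch, assuming the sample standard deviation (n − 1 denominator) is used, which the text does not specify; the function name and the example values are illustrative only:

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Percent relative standard deviation of replicate measurements.

    Uses the sample standard deviation (n - 1 denominator); this choice is an
    assumption, since the text does not state which form was applied.
    """
    if len(values) < 2:
        raise ValueError("at least two replicates are required")
    return stdev(values) / mean(values) * 100.0


# Illustrative triplicate CCS values in Å^2 (not taken from the study).
triplicate_ccs = [99.0, 100.0, 101.0]
print(round(percent_rsd(triplicate_ccs), 3))
```

With only three replicates per adduct, a single deviating injection moves the %RSD noticeably, which is worth keeping in mind when comparing against the <1% values reported for other instruments.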

Correlation of m/z and TW CCS N2 Values
Linear correlations between the measured TW CCS N2 and m/z values were observed for both ESI− and ESI+ modes, with coefficients of determination (R2) of 0.9554 and 0.9680, respectively (Figure 2, left). A representative compound structure for each class is shown in Figure 2 (right). We observed a small deviation from the trendline for the protonated species of prenol lipids (m/z 440−515), for which the experimental TW CCS N2 values were higher than the trendline. Overall, the two polarities showed no distinct trendlines among classes, implying that these small metabolites have equally high levels of mass dependency on CCS. Previous studies have observed distinct trendlines for compound classes with higher masses (500−1000 Da), such as lipids and peptides, while small molecules, including specialized metabolites, occupy lower mass regions and appear to have overlapping trendlines. 21,24,47,53 However, Belova et al. observed distinct trendlines for perfluoroalkyl carboxylic acids and polyfluoroalkyl substances, and for other environmental contaminants such as bisphenols, organophosphates, and plasticizers. 17 To obtain a broader perspective, we overlaid the IM-MS associations, categorized into classes, on the Unified CCS Compendium, which comprises over 3800 experimental CCS values (Figure S3). 21 Our collection of experimental CCS values contributed mostly (84%) to the lower region (140−400 Da) of the IM-MS conformational space. Separate overlays of the compound classes contained in both databases, which are flavonoids, anthracenes, and benzene and substituted derivatives, show that our TW CCS N2 values fit well with the trends deduced from the Unified CCS Compendium (Figure S3, insets).
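A linear trendline of CCS against m/z and its R2 can be obtained by ordinary least squares. The sketch below is a plain-Python illustration under that assumption; the fitting software used for Figure 2 is not stated in the text, and the function name and data points are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Illustrative (m/z, CCS) pairs, not measured values.
mz_values = [100.0, 200.0, 300.0]
ccs_values = [120.0, 150.0, 180.0]
slope, intercept, r_squared = linear_fit(mz_values, ccs_values)
print(slope, intercept, r_squared)
```

Fitting each compound class separately with the same function is one way to check whether any class departs from the pooled trendline, as was seen for the protonated prenol lipids.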

CCS Value Comparisons
When considering CCS measurement standardization, the comparability of CCS values derived from different instruments, sample treatment protocols, or laboratories is of concern. An interlaboratory study on CCS values of mycotoxins using TWIMS instruments reported the reproducibility of CCS measurement (ΔCCS% < 2%). 54 TWIMS-derived CCS values were also compared with drift tube ion mobility MS (DTIMS)-derived CCS values. Although both TWIMS and DTIMS are considered time-dispersive instruments, DTIMS provides a direct measurement of CCS values, while TWIMS provides CCS values that are obtained indirectly through a calibration procedure in which the selection of calibrant can contribute to deviations. 47 Results from some previous studies supported the comparability of CCS measurements derived by TWIMS and DTIMS, showing small variabilities (∼ΔCCS% < 2%) for compounds such as pesticides, pharmaceuticals, and pesticide metabolites. 20,55−59 Collectively, 35 database entries, mostly flavonoids, cinnamic acids and derivatives, and benzene and substituted derivatives, had DT CCS N2 literature values. Of the 35 database entries, 23 and 32 had Δ DT/TW CCS% < 2% and <5%, respectively. Deprotonated trans-cinnamic acid yielded the largest Δ DT/TW CCS% value of 27%. Because this large deviation came from a single source of experimental values, we obtained the predicted values for trans-cinnamic acid using CCSbase and AllCCS. Comparing the predicted values with our value yielded ΔCCS% of 3.8% (CCSbase) and 9.5% (AllCCS), demonstrating that the predicted values are closer to our value than is the experimental DT CCS N2 value. However, this large discrepancy requires additional measurements to confirm the absolute value. When the outlier was excluded, the average Δ DT/TW CCS% values were 1.3% (standard deviation, SD, 1.4%) and 2.1% (SD 1.7%) for deprotonated and protonated ions, respectively. Comparisons between our measured TW CCS N2 values and literature TW CCS N2 values were performed on 26 database entries; 17 and 25 entries had Δ TW/TW CCS% < 2% and <5%, respectively (Table S7). The CCS variation was larger for deprotonated ions, with an average of 2.8% (SD 2.3%), and smaller for protonated ions, with an average of 1.1% (SD 1.0%). Deviations greater than 5% were observed for deprotonated p-coumaric acid (8.0%) and vanillic acid (5.6%). Our TW CCS N2 values were systematically larger than those reported previously, which were derived using a different calibrant (polyalanine). 16,57,58 Variation in TWIMS-derived CCS values caused by different calibrants has been previously discussed. 60 A recent study reported CCS deviations of up to 25% when 11 different calibrants were used to determine TWIMS-derived CCS values of lipids. 61 Additionally, different TWIM-MS settings could cause deviations in CCS values, 62,63 which, for some compounds, could be larger than a typical ΔCCS% threshold of 2%.
Overall, the percent deviations from DT CCS N2 measurements were within the generally accepted CCS deviation (<2%), with an average Δ DT/TW CCS% of 1.7% (SD 1.5%) (Table S7). For TW CCS N2 measurements, the average Δ TW/TW CCS% was 2.2% (SD 2.0%), slightly higher than the typical threshold. Therefore, the results call for caution when using TW CCS N2 libraries created with different calibrants, and attention must be paid to some outliers when applying this database.
One of the major hurdles in metabolomics is the lack of reference standards to confirm the identity of the features detected in biological samples. Although CCS measurement has been introduced to increase the accuracy of metabolite identification, establishing CCS databases has a similar limitation. Considerable effort has gone into building models to predict CCS values, for instance, CCSbase, 23 AllCCS, 22 ISiCLE, 25 and DeepCCS. 26 The plots of the experimental and predicted CCS values exhibited good linear relationships (Figure 3). The slopes were 1.02 and 1.00 for AllCCS and CCSbase, respectively, which were close to a perfect fit (slope = 1), implying high predictive performance of both models. The linear fits suggest that AllCCS slightly overestimated the CCS values. When the experimental TW CCS N2 values were compared with the predicted values provided by CCSbase, including all adducts, lower deviations were observed; 64% of the metabolites in the database demonstrated ΔCCS% < 2%, compared with 45% for AllCCS. For isochromanequinones, which have never been used to train any models, the predicted values given by CCSbase were closer to the experimental values, with an average ΔCCS% of 1.3% compared with 2.8% given by AllCCS. In this case, the higher predictive performance of CCSbase indicates that it generalizes better; nonetheless, a larger data set with more diverse classes is required to confirm the observations.
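The comparison of the two prediction models reduces to counting how many predicted values fall within a given ΔCCS% of the experimental values. A minimal sketch, assuming the experimental value is the denominator of ΔCCS%; the function names and all numbers are hypothetical and do not reproduce the 64% and 45% reported above.

```python
def percent_deviation(predicted: float, experimental: float) -> float:
    """Absolute percent deviation of a predicted CCS from the experimental CCS."""
    return abs(predicted - experimental) / experimental * 100.0


def fraction_within(experimental, predicted, tol_pct: float = 2.0) -> float:
    """Fraction of entries whose prediction deviates by less than tol_pct percent."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for exp, pred in zip(experimental, predicted)
               if percent_deviation(pred, exp) < tol_pct)
    return hits / len(experimental)


# Illustrative CCS values in Å^2 for two hypothetical models.
experimental_ccs = [100.0, 200.0, 150.0, 180.0]
model_a_ccs = [101.0, 210.0, 150.0, 181.0]
model_b_ccs = [104.0, 212.0, 155.0, 181.0]
print(fraction_within(experimental_ccs, model_a_ccs))
print(fraction_within(experimental_ccs, model_b_ccs))
```

Reporting this fraction per compound class, rather than only for the pooled set, would show whether a model's advantage comes from classes it was trained on.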
Although CCSbase appears to slightly outperform AllCCS for this set of specialized metabolites, the CCSbase-derived CCS of protonated crotepoxide demonstrated the highest deviation of 12% (Figure 3), indicating that the experimental value was much greater than the predicted value.Crotepoxide, which contains cyclohexane diepoxy functionalities, was the only compound in our database with epoxide groups.This finding is not fully understood.Nonetheless, this outlier suggests that the protonation of this molecular structure may not be well captured by the prediction model, possibly due to its highly oxygenated structure in the presence of reactive epoxides that may form different charged isomers than the predicted protonated structure.The crotepoxide reference standard used in this study was isolated from Kaempferia rotunda. 35
The distributions of the identified metabolites (level 1) and their intensities in different parts of the plant are shown in Figure 4 (pie charts). The leaf samples exhibited a distinct metabolite profile due to the enrichment of the flavonoid glycosides quercetin-3-O-rutinoside (rutin) and kaempferol-3-O-rutinoside (nicotiflorin). Other parts were enriched in PNQs > ATQs > flavonoids. With regard to PNQs, different distributions were observed across different parts; PNQ-332 was the most abundant PNQ in heartwood and wood, while PNQ-288B and PNQ-302 were the most abundant in bark and root, respectively. PNQ-318A, a potent anti-inflammatory metabolite, 27 was the most abundant in heartwood, followed by bark, wood, and roots (Table S8). To the best of our knowledge, this is the first time that flavonoids, XAN-330, NAF-304, and protocatechuic acid have been identified in V. harmandiana.
To further investigate other features through TW CCS N2 data, we focused our analysis on heartwood samples because they contained the highest levels of PNQ-318A (Table S8). We plotted the TW CCS N2 values and m/z values of the protonated features that had ion intensities greater than 1000 (Figure 5). The plot shows that most detected features occupied spaces overlapping with the metabolites present in our database, where we observed no class-dependent trends. It is also challenging to assign probable class labels using larger databases because the trendlines and corresponding confidence intervals of small molecules overlap. This may require advanced data mining tools to evaluate different trends. In addition, the intrinsic uncertainty associated with CCS measurements can complicate the identification. However, we observed an interesting IM-MS profile of protonated features lying above the others (circled yellow dots in Figure 5), which demonstrated higher TW CCS N2 values than other detected features with the same masses. The steeper slope suggests that they belonged to other classes of compounds not present in our database, most likely lipid-like compounds with higher TW CCS N2 values due to their less compact structures in the gas phase. 23,47 Judging from the trendlines provided in the Unified CCS Compendium, another possible class is organonitrogen compounds, which also exhibit a steeper trendline. To explore this hypothesis, we overlaid the IM-MS data of the detected features with the trendlines of lipids and lipid-like molecules and of organonitrogen compounds retrieved from the Unified CCS Compendium. Figure 5 shows that the features with higher TW CCS N2 values are scattered closer to the organonitrogen compound and lipid and lipid-like molecule trendlines.
To obtain candidate metabolites for those features, we performed metabolite identification by matching the experimental m/z and fragment ions with the HMDB and ChEBI databases. Using the SMILES of the candidates, we derived the predicted CCS values of potential candidates using CCSbase and AllCCS. For each feature, the candidate with the lowest difference between the average predicted and experimental values (ΔCCS%) was assigned as a tentative metabolite only if ΔCCS% was less than 4% (Table S9). For example, a feature with m/z 284.2950 and a TW CCS N2 value of 185.97 Å2 resulted in 11 metabolite candidates. Using SMILES, we obtained the CCS values of the candidates using CCSbase and AllCCS (inset table in Figure 5). This feature was assigned to 1-deoxy-3-dehydrosphinganine, which yielded the lowest ΔCCS%. Nonetheless, the accuracy of this metabolite identification approach also depends on the number of metabolites listed in the chosen databases. The overall results show that most of the tentative metabolites were fatty acyls, organonitrogen compounds, and organooxygen compounds, which is consistent with the observed TW CCS N2 trend. This demonstrated the potential of IM-MS analysis for identifying new metabolites or classes and for shortening the list of metabolite candidates in untargeted experiments.
We repeated a similar search for other parts of the plant samples to report the abundances of these tentative metabolites across the different parts (Table S9). Some of these were isomers for which our experimental data may be insufficient to provide definite structures. As shown in Table S9, five tentative metabolites were consistently detected across the different parts: hexadecanamide and isomers (1a,b), hexadecasphinganine and an isomer (3a,b), and 1-deoxysphinganine (5). 1-Deoxysphinganine is produced via atypical metabolism by mammals and plants. 64,65 In humans, an elevated level has been linked to many diseases, 66 and it was found to accumulate when mammalian cells were exposed to the mycotoxin fumonisin B1. 67 The presence of 1-deoxysphinganine in V. harmandiana may lead to future investigations on microbe−plant interactions and associated metabolites.
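The candidate selection rule described above (average the two predicted CCS values, pick the candidate with the smallest ΔCCS%, and accept it only below 4%) can be sketched as follows. The candidate names and predicted values are hypothetical placeholders, not the contents of the Figure 5 inset table, and the experimental value is assumed to be the denominator of ΔCCS%; only the experimental CCS of 185.97 Å2 comes from the text.

```python
def pick_candidate(experimental_ccs, predictions, tol_pct=4.0):
    """Return (name, delta_ccs_percent) of the best candidate, or None.

    predictions maps a candidate name to its predicted CCS values
    (for example, one from each prediction tool), which are averaged.
    """
    best_name, best_delta = None, None
    for name, values in predictions.items():
        average_predicted = sum(values) / len(values)
        delta = abs(average_predicted - experimental_ccs) / experimental_ccs * 100.0
        if best_delta is None or delta < best_delta:
            best_name, best_delta = name, delta
    if best_delta is not None and best_delta < tol_pct:
        return best_name, best_delta
    return None


# Hypothetical candidates with made-up predicted CCS values (Å^2).
candidates = {
    "candidate_A": (186.5, 187.5),
    "candidate_B": (175.0, 178.0),
}
print(pick_candidate(185.97, candidates))
```

Returning `None` when no candidate passes the tolerance keeps features unassigned instead of forcing the least-bad match, which mirrors the "only if ΔCCS% was less than 4%" condition.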

■ CONCLUSION
We developed a UPLC-TWIMS-QTOF method to characterize 112 plant specialized metabolites, including 15 specialized metabolites that were isolated in-house and are not commercially available. The established MS and TW CCS N2 database provided reliable and reproducible m/z values, retention times, fragment ions, and TW CCS N2 values of 207 adducts (ESI+ and ESI−). The database includes 158 novel TW CCS N2 values from 79 specialized metabolites. We demonstrated that CCS measurement is robust and reliable, yielding small variations when performed within 3 days. The development of the TW CCS N2 and MS database is ongoing, and we continue to collect data on specialized metabolites from tropical plants.
Regarding the analysis of isomeric compounds, most of the isomeric compounds could be separated by retention time, highlighting the importance of combining chromatography, IM, and MS for the analysis of complex samples. The separation of coeluting isomers with a small ΔCCS% could be restricted by the resolving power of TWIMS. The IM-MS data of the specialized metabolites exhibited a good linear relationship; however, we observed no distinct trend among classes. This implied that these specialized metabolites have equally high levels of mass dependency on CCS. Comparability between TW CCS N2 and DT CCS N2 values was observed, with small percent deviations (∼ΔCCS% < 2%) for the majority of the selected database entries. For this set of specialized metabolites, CCSbase provided lower prediction errors than AllCCS. However, the highest CCS prediction error, observed for crotepoxide with CCSbase, prompts future investigations. To validate the established database, we extended the metabolite coverage of V. harmandiana. The identified metabolites demonstrated relatively low average mass error (2.2 ± 2.6 ppm) and ΔCCS% (0.9% ± 0.8%). In addition to PNQs, which are important metabolites in V. harmandiana, we were able to identify flavonoids, xanthone, naphthofuran, and protocatechuic acid for the first time through targeted analysis. A distinct IM-MS profile of a group of features suggested the presence of organonitrogen compounds and lipid and lipid-like molecules.
Other percent differences in the CCS values of two compounds of interest (ΔCCS%) were calculated from the equation below.
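The equation is not reproduced in this text. Based on the definition given in the discussion of isomer separation (the difference in CCS of two peaks divided by their average value), it can be written as follows, where the subscripts A and B are labels introduced here for the two compounds and the absolute value is an assumption:

```latex
\Delta\mathrm{CCS}\% =
  \frac{\left|\mathrm{CCS}_{A} - \mathrm{CCS}_{B}\right|}
       {\left(\mathrm{CCS}_{A} + \mathrm{CCS}_{B}\right)/2} \times 100
```

For two illustrative isomers with CCS values of 170.0 and 172.0 Å2, this gives 2.0/171.0 × 100 ≈ 1.2%.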

Figure 1. Distribution of compound classes included in the study.

Figure 2. (Left) Correlations of measured m/z and TWIMS-derived CCS values of 112 plant metabolites analyzed in ESI− and ESI+ modes. Dotted lines represent linear fits to the data. (Right) Chemical structures of representative compounds for all classes.
Among them, AllCCS and CCSbase provide a web-based interface and cover diverse molecular classes. In this study, we compared experimental and predicted CCS values obtained by AllCCS and CCSbase. We calculated CCS values using IMoS for compounds for which the CCS values of [M − H − H2O]− and [M + H − H2O]+ adducts were not available in the online databases.
The bark samples contained the highest number of features (1199 [M + H] + and 1460 [M − H] − features), while the leaf samples contained the lowest number (649 [M + H] + and 727 [M − H] − features) (Figure

Figure 5. Measured CCS vs m/z values of protonated adducts of detected features with intensity higher than 1000 in heartwood samples (yellow) and of compounds in the in-house database (gray). Circled yellow dots represent the features that showed higher CCS values than the others and were subjected to tentative metabolite identification. Blue and green trendlines (with 95% confidence intervals shown as dotted lines) were obtained by regressing CCS against m/z values of lipid and lipid-like molecules and organonitrogen compounds (Unified CCS Compendium) using a second-order polynomial model.